Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities

Barley (Hordeum vulgare) is essential to global food systems and the brewing industry. Its physiological traits and microbial communities determine malt quality. Although microbes influence barley from seed health to fermentation, metagenomic insights into the seed storage phase are lacking. Characterising changes in the microbial composition of barley seeds is needed to understand how these fluctuations affect seed health and, ultimately, both agricultural yield and the quality of barley-derived products. Whole metagenomes were sequenced from eight barley seed samples obtained at different storage time points from harvest to nine months. After binning, 82 metagenome-assembled genomes (MAGs) belonging to 26 distinct bacterial genera were assembled, with a substantial proportion of potentially novel species. Most of our MAG dataset (61%) showed over 90% genome completeness. This pioneering barley seed microbial genome retrieval provides insights into species diversity and structure, laying the groundwork for understanding barley seed microbiome interactions at the genome level.


Background & Summary
Seed microbiomes are essential to plant health, growth, and resilience, and play an important role in the physiological processes required for effective crop development 1. The barley seed microbiome, in particular, is of critical importance, influencing not only crop yield but also the quality of barley-derived products 2,3. Barley (Hordeum vulgare) has been integral to agriculture since the early phases of human civilization 4. Its significance in the modern era is two-fold: as a fundamental component of the global food system, and as a crucial ingredient in the brewing industry 3,5. While the physiological attributes of barley influence malt quality, the microbial communities associated with barley also play an essential role, from sowing to malting 2.
Malting barley seeds are colonised by rich and diverse microbial communities, encompassing both endophytic and epiphytic organisms 1,6,7. These microorganisms, which can be both beneficial and detrimental, have the potential to affect seed health, germination success, and the quality of fermentation products 8–10. Several studies highlight the diversity of microbial populations associated with malting barley and their potential effects on brewing product quality 8,11,12. Understanding these microbial communities and their genomic content can provide insights into seed storage longevity, contamination risks, and their potential impact on subsequent production stages. However, there is a notable gap in comprehensive metagenomic datasets focusing on these microbial communities, especially during the seed storage phase.
Metagenome sequencing can provide profound insights into microbial ecosystems without necessitating laboratory cultivation 13–15. This approach not only provides a comprehensive understanding of the taxonomic and functional variations among phytomicrobial communities, but also sheds light on the complex interrelationships across these communities and their plant hosts 16,17. In the context of barley seed storage, acquiring this understanding using omics paves the way for developing microbial management strategies, optimising storage conditions, mitigating losses, and ensuring consistent production of premium malt.
Whole metagenomes were sequenced from eight samples of barley seeds stored in silos at four different time points (two samples per time point), namely at harvest and after three, six and nine months, respectively (Table S1).
The metagenomic data were assembled into nearly complete microbial genomes. A total of 82 metagenome-assembled genomes (MAGs) were assembled from these metagenomes (Table S2). The completeness of the MAGs was evaluated using CheckM v1.2.2 18. All MAGs demonstrated completeness >75%, with 50/82 being >90% complete. These completeness values are in alignment with the high-quality draft criterion of the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards for Bacteria and Archaea 19 (Fig. 1, Table S2).
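The MIMAG draft tiers referred to above can be expressed as a simple threshold check on CheckM-style completeness and contamination percentages. The following is a minimal sketch under stated assumptions: the function name `mimag_tier` and the example bins are hypothetical, and the sketch considers completeness and contamination only, whereas the full high-quality definition also requires rRNA and tRNA genes.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Return a MIMAG-style draft tier from completeness/contamination (%).

    Assumption: only the two percentage thresholds are checked. The rRNA
    and tRNA requirements of the high-quality tier are not evaluated here.
    """
    if completeness > 90 and contamination < 5:
        return "high-quality draft"
    if completeness >= 50 and contamination < 10:
        return "medium-quality draft"
    if contamination < 10:
        return "low-quality draft"
    return "below MIMAG draft thresholds"


# Hypothetical (completeness %, contamination %) pairs, not values from Table S2.
examples = {"bin_A": (97.3, 1.2), "bin_B": (78.0, 6.5), "bin_C": (45.0, 3.0)}
for name, (comp, cont) in examples.items():
    print(name, mimag_tier(comp, cont))
```

Applied to a table of per-MAG CheckM results, a check of this kind gives a quick count of how many genomes meet each tier on these two metrics.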
Furthermore, minimal levels of sequence heterogeneity were observed for all 82 MAGs. Approximately 91% (75/82) of the MAGs registered contamination levels <5%, whereas the remaining seven MAGs exhibited contamination levels between 5 and 10%, supporting the reliability and integrity of our dataset (Fig. 1 and Table S2). We identified a notable negative correlation between genome completeness and contamination (r = −0.498, p < 0.00001; Fig. 2A). In parallel, our data demonstrated a weak positive relationship between genome size and the N50 metric (r = 0.251, p = 0.023; Fig. 2B), indicating that larger genomes are often associated with superior assembly contiguity.
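For readers who wish to recompute correlations of this kind from Table S2, the sketch below shows a Pearson coefficient and the associated t statistic with n − 2 degrees of freedom. It is an illustration only: the statistical software used in this study is not stated here, the function names are hypothetical, and n = 82 is assumed to be the number of MAGs entering each correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Reported coefficients with the assumed sample size of 82 MAGs.
for label, r in (("completeness vs contamination", -0.498),
                 ("genome size vs N50", 0.251)):
    print(label, round(t_statistic(r, 82), 3))
```

The resulting t values can be compared against a t distribution with 80 degrees of freedom to obtain two-sided p-values.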
Taxonomic evaluation using the Genome Taxonomy Database Toolkit (GTDB-Tk) 20 revealed that the barley-associated MAG dataset was dominated by members of the phylum Pseudomonadota (formerly the Proteobacteria), comprising 53.7% (44/82) of the total MAGs (Table S2). This is consistent with the findings from a previous amplicon sequencing-based study of barley seed endophytic microbial communities 7. However, in contrast to the previous findings, we identified Bacteroidota (16/82) as the second most prevalent phylum. The abundances of Actinobacteria and Bacillota (Firmicutes) in our study also differed from those previously reported 7, underscoring the inherent variability of barley seed microbiomes (Fig. 1 and Table S2).
Temporal shifts in genera abundance over nine months. The barley-seed derived MAGs were classified into 26 bacterial genera across eight phyla and six classes (Table S2). The microbiome was characterised by several dominant genera, with thirteen, nine, seven and six MAGs belonging to the genera Erwinia, Pseudomonas, Chryseobacterium and Paenibacillus, respectively (Fig. 3). Notably, 16 MAGs could not be accurately classified at the species level, highlighting the underexplored microbial diversity associated with barley seeds (Fig. 4, Table S2).
The barley seed microbiome shows discernible shifts during storage (Fig. 5). While the genera Erwinia and Duffyella remain present from harvest through prolonged storage, there is a notable decrease in Chryseobacterium and a notable increase in Pseudomonas_E during silo storage. These shifts may provide insights into the role of the barley seed microbiome in both seed health and disease. Chryseobacterium spp. have been observed to counteract the effects of Magnaporthe oryzae, a cause of barley blast disease, primarily by detaching fungal spores from leaf surfaces 21, and may contribute to maintaining seed health in the field. Duffyella also garnered interest due to its observed ability to curb the growth of Fusarium tricinctum, another pathogen affecting barley 22,23. All Erwinia MAGs identified in the study were classified in the species E. persicina, a known broad host range phytopathogen, which has been linked to pink seed disease in barley 24. Pseudomonas-like taxa in this study were classified as part of the genus-level lineage Pseudomonas_E, as designated by the GTDB classification database 20.
Fig. 3 Genomic metrics of the identified bacterial genera.

Methods
Sample collection and processing. Malting barley (Hordeum vulgare) samples of a single cultivar (Kadie) were sourced from Anheuser-Busch InBev (AB InBev) storage facilities in the Western Cape province, South Africa. Samples were collected at four distinct time points: immediately post-harvest and then after three, six, and nine months of storage in silos. At each time point, three samples were collected. All samples were aseptically collected and stored at −20 °C to inhibit microbial growth.
DNA isolation and sequencing. Approximately 10 g of barley was crushed using a sterilised mortar and pestle. The resulting residue was suspended in 40 ml of phosphate buffered saline (PBS) solution (pH 7.4). The suspension was briefly vortexed to homogenise the mixture, followed by sonication at 18 W amplitude with a 30-s on-off pulsating schedule for 7 min. The mixture was centrifuged at 4000 × g for 1 min to separate the supernatant, which was transferred to an autoclaved filtration system consisting of a polycarbonate filter holder and a filter membrane (0.45 µm pore size, Sartorius-Stedim Biotech).
Metagenomic DNA was extracted from the filter using the ZymoBIOMICS DNA/RNA Miniprep Kit (Zymo Research), following the protocol recommended by the manufacturer. A NanoDrop Lite Spectrophotometer (Thermo Fisher Scientific) was used to assess the integrity and purity of the DNA and to quantify it. The metagenomic DNA samples were sequenced using the Illumina NovaSeq 6000 platform (paired end reads, 2 × 250 bp) at Molecular Research (MRDNA, Texas, USA). The total number of reads obtained was approximately 365.27 million. On average, each sample yielded around 22.83 million reads, with the maximum number of reads for a single sample being approximately 38.26 million and the minimum around 10.36 million. These metrics provide an overview of the sequencing depth achieved in our study. A detailed breakdown of read counts for each sample is provided in Table S1.

Metagenomic data analysis.
Raw sequence reads were evaluated for quality using FastQC v0.12.1 25 and MultiQC v1.15 26. Trimmomatic v0.36 27 was used to filter out reads shorter than 36 bp or with an average quality score lower than 15. The removal of host DNA was performed using Bowtie2 v2.5.1 28 and SAMtools v1.19 29. Initially, an index database employing the reference genome of barley (Hordeum vulgare, Accession number: GCF_904849725.1) was constructed using the bowtie2-build command. Subsequently, reads were mapped to the host sequence database with Bowtie2, preserving both aligned and unaligned paired end reads. Following this, SAMtools was used to convert the SAM file into BAM format. The required unmapped reads were isolated by applying SAMtools SAM-flag filters (-f 12 and -F 256), which selected pairs where both reads (R1 and R2) were unmapped. Finally, the SAMtools sort and SAMtools fastq commands were used to separate the paired end reads into distinct fastq files. Host DNA contamination varied across samples: the mean contamination ratio was approximately 0.5757%, with a minimum of 0.0059% (3,088 contaminated reads out of 52,678,404) and a maximum of 2.7368% (567,134 contaminated reads out of 20,155,530) (Table S1). The reads were then assembled using metaSPAdes v3.15.3 30 with default parameters. The integrity and quality of the final assemblies were evaluated using QUAST v5.2.0 31.
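The effect of the SAM-flag filters described above can be illustrated with the underlying bit logic. The sketch below is not part of the pipeline itself; the function name `passes_filter` and the example flag values are chosen for illustration. A record is kept when all bits in the required mask (12 = read unmapped + mate unmapped) are set and no bit in the excluded mask (256 = secondary alignment) is set.

```python
READ_UNMAPPED = 0x4    # bit set when the read itself is unmapped
MATE_UNMAPPED = 0x8    # bit set when the mate is unmapped
SECONDARY = 0x100      # bit set for secondary (non-primary) alignments


def passes_filter(flag: int,
                  require: int = READ_UNMAPPED | MATE_UNMAPPED,  # -f 12
                  exclude: int = SECONDARY) -> bool:             # -F 256
    """Mimic `-f require -F exclude`: all required bits set, no excluded bit set."""
    return (flag & require) == require and (flag & exclude) == 0


# Illustrative SAM flags for paired-end records.
for flag, meaning in [
    (77, "R1, read and mate unmapped"),
    (141, "R2, read and mate unmapped"),
    (99, "R1, mapped in proper pair"),
    (73, "R1 mapped, mate unmapped"),
    (69, "R1 unmapped, mate mapped"),
]:
    print(flag, meaning, "->", "keep" if passes_filter(flag) else "drop")
```

Requiring both unmapped bits is what distinguishes this filter from `-f 4` alone, which would also keep reads whose mate aligned to the host genome.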

Phylogenetic analysis and classification of MAGs.
For taxonomic assignment of MAGs, the classify_wf workflow from GTDB-Tk v3.4.2 20 was employed in tandem with the GTDB reference data release 207_v2 20, with default settings. A phylogenetic tree encompassing the 82 species-level bacterial MAGs was inferred from 120 bacterial marker genes using the infer module (gtdbtk infer) in GTDB-Tk. To improve interpretation and visualisation, the tree was annotated using iTOL v5 37.
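GTDB classifications are reported as semicolon-separated strings with rank prefixes (d__, p__, c__, o__, f__, g__, s__), and an empty s__ field indicates that no species was assigned. The following minimal sketch shows how such strings could be parsed to flag MAGs without a species-level assignment. The function names and the two example strings are illustrative assumptions and are not taken from Table S2.

```python
RANKS = ("d", "p", "c", "o", "f", "g", "s")


def parse_gtdb(classification: str) -> dict:
    """Split a GTDB lineage string into a {rank_prefix: name} mapping."""
    taxa = {}
    for field in classification.split(";"):
        prefix, _, name = field.strip().partition("__")
        if prefix in RANKS:
            taxa[prefix] = name
    return taxa


def lacks_species(classification: str) -> bool:
    """True when the s__ field is missing or empty."""
    return parse_gtdb(classification).get("s", "") == ""


# Illustrative lineage strings (hypothetical, not copied from the dataset).
lineages = {
    "mag_01": "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
              "o__Enterobacterales;f__Enterobacteriaceae;g__Erwinia;"
              "s__Erwinia persicina",
    "mag_02": "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
              "o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas_E;s__",
}
unassigned = [name for name, lin in lineages.items() if lacks_species(lin)]
print(unassigned)
```

The same parsed mapping can be used to tally MAGs per genus or phylum.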

Data records
The data records are available on Figshare 38.
The 82 MAGs have been deposited at DDBJ/ENA/GenBank under the accession numbers listed in Table 1.
Additional metadata and details about each MAG are available in Supplementary Table S2.
The raw reads used to reconstruct the MAGs have been deposited in the NCBI Sequence Read Archive 120.

Technical Validation
Data quality was supported by the implementation of robust software applications, such as FastQC, MultiQC, and Trimmomatic, which were used to curate and refine the sequence data. Combining the comprehensive MetaWRAP pipeline with dependable tools such as CheckM and GTDB-Tk strengthened the binning, genome assembly, and taxonomic assignment processes. The culmination of these validation stages is a technically sound dataset that supports dependability and reproducibility in metagenomic research.

Table 1. Genomic characteristics and accession numbers of 82 microbial genomes from barley seed communities described in this study.